ssl: fix handshake hang that manifested on OS X.#2624
ssl: fix handshake hang that manifested on OS X.#2624htuch merged 4 commits intoenvoyproxy:masterfrom
Conversation
When a handshake completed in transport socket doRead(), we didn't check to see if there were any pending bytes in the write buffer in Network::ConnectionImpl. This showed up in envoyproxy#2598, where ads_integration_test would hang while waiting for the server to connect to the fake management upstream. Risk Level: Medium (messing around with connection handshaking can do bad things). Testing: Additional unit test for ConnectionImpl changes. Validated ads_integration_test now passes on OS X. Signed-off-by: Harvey Tuch <htuch@google.com>
| // completion, which may also have completed in the context of onReadReady(), | ||
| // where no check of the write buffer is made. Provide an opportunity to flush | ||
| // here. If connection write is not ready, this is harmless. | ||
| if (event == ConnectionEvent::Connected && write_buffer_->length() > 0) { |
There was a problem hiding this comment.
Should this happen before notifying callbacks?
If we leave this after the above loop, should check whether the connection was closed before doing this.
| Buffer::OwnedImpl buffer(val); | ||
| EXPECT_CALL(*file_event_, activate(Event::FileReadyType::Write)).WillOnce(Invoke(file_ready_cb_)); | ||
| EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false)) | ||
| .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false})); |
There was a problem hiding this comment.
I think you should return 0 instead of 100; otherwise you're claiming that 100 bytes were written, but this test scenario is saying that it cannot be written yet (and 100 bytes is more than the size of the write, so it's weird).
| // in the write buffer, we should see a doWrite with this data. | ||
| EXPECT_CALL(*transport_socket_, doRead(_)).WillOnce(InvokeWithoutArgs([this] { | ||
| transport_socket_callbacks_->raiseEvent(Network::ConnectionEvent::Connected); | ||
| return IoResult{PostIoAction::KeepOpen, 100, false}; |
There was a problem hiding this comment.
I think if you're going to return 100, you should put 100 bytes in the buffer. Otherwise it's an API violation.
| })); | ||
| EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false)) | ||
| .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false})); | ||
| EXPECT_CALL(*transport_socket_, doWrite(_, true)) |
There was a problem hiding this comment.
What causes this call? Is this from shutdown of the test? If so, let's put this EXPECT after the file_ready_cb_ below.
Signed-off-by: Harvey Tuch <htuch@google.com>
| // where no check of the write buffer is made. Provide an opportunity to flush | ||
| // here. If connection write is not ready, this is harmless. | ||
| if (event == ConnectionEvent::Connected && write_buffer_->length() > 0) { | ||
| onWriteReady(); |
There was a problem hiding this comment.
onWriteReady() can close the socket. Need to check that before continuing. Although, if that happens, filters will get notified of a close before being notified that it was connected.
Hmmm... Thinking this through, it is probably better to notify all filters that they're connected (ie put it back to the way it was before), and make this case be:
if (state() == State::Open && event == ConnectionEvent::Connected && write_buffer_->length() > 0)
There was a problem hiding this comment.
Sorry to go in circles on this; ConnectionImpl is complicated.
There was a problem hiding this comment.
Sure, this is an area of the code I'm less familiar with, and it's hard to reason about what callbacks can and can't do here, so thanks for the patience.
Signed-off-by: Harvey Tuch <htuch@google.com>
ggreenway
left a comment
There was a problem hiding this comment.
1 more small change, then I think this is good to go
| Buffer::OwnedImpl buffer(val); | ||
| EXPECT_CALL(*file_event_, activate(Event::FileReadyType::Write)).WillOnce(Invoke(file_ready_cb_)); | ||
| EXPECT_CALL(*transport_socket_, doWrite(BufferStringEqual(val), false)) | ||
| .WillOnce(Return(IoResult{PostIoAction::KeepOpen, 100, false})); |
Signed-off-by: Harvey Tuch <htuch@google.com>
| // where no check of the write buffer is made. Provide an opportunity to flush | ||
| // here. If connection write is not ready, this is harmless. We should only do | ||
| // this if we're still open (the above callbacks may have closed). | ||
| if (state() == State::Open && event == ConnectionEvent::Connected && |
There was a problem hiding this comment.
@htuch @ggreenway I'm late here, but I think it's slightly strange that in the plaintext case we effectively recurse into onWriteReady() which already handles this case by trying to write after a connection happens. Did we consider any other variant of this change? It's not a big deal just curious.
There was a problem hiding this comment.
That is weird; I didn't catch that. No, we didn't try anything else. We should look into if there's a better way to handle this. It seems to work, but it is strange, and hard to reason about.
There was a problem hiding this comment.
It works because we flip connecting_ status before re-entry. I agree this is harder to reason about than would be ideal. Should we just return unconditionally in raw buffer socket onWriteReady() following a new connection and defer to the new write check?
There was a problem hiding this comment.
@htuch agreed the code works as is. To make it clearer yes I would either return unconditionally with a comment here https://github.com/envoyproxy/envoy/blob/master/source/common/network/connection_impl.cc#L477 or another option is to factor out the 2nd part of onWriteReady() into a separate function and call that in the SSL case when handshake completes during a read callback. I think the former is simpler IMO.
* feat(stats): support grpc status codes in metrics * wip * add tests and fix up context * set empty grpc_response_code * use latest envoyproxy/envoy-wasm Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * add license/copyright banner Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * fix lint / format / malign issues Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * fix up alpn_test.cc Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * fix lint Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * more tests needed updating with envoy update Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * stackdriver fix Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * fix stackdriver onConfigure Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * remove unused using clause Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com> * clang-format Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Signed-off-by: JP Simard <jp@jpsim.com>
When a handshake completed in transport socket doRead(), we didn't check
to see if there were any pending bytes in the write buffer in
Network::ConnectionImpl. This showed up in #2598, where
ads_integration_test would hang while waiting for the server to connect
to the fake management upstream.
Risk Level: Medium (messing around with connection handshaking can do
bad things).
Testing: Additional unit test for ConnectionImpl changes. Validated
ads_integration_test now passes on OS X.
Signed-off-by: Harvey Tuch htuch@google.com